AITopics | term candidate

Collaborating Authors

term candidate

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Extracting domain-specific terms using contextual word embeddings

Repar, Andraž, Lavrač, Nada, Pollak, Senja

arXiv.org Artificial IntelligenceFeb-24-2025

Automated terminology extraction refers to the task of extracting meaningful terms from domain-specific texts. This paper proposes a novel machine learning approach to terminology extraction, which combines features from traditional term extraction systems with novel contextual features derived from contextual word embeddings. Instead of using a predefined list of part-of-speech patterns, we first analyse a new term-annotated corpus RSDO5 for the Slovenian language and devise a set of rules for term candidate selection and then generate statistical, linguistic and context-based features. We use a support-vector machine algorithm to train a classification model, evaluate it on the four domains (biomechanics, linguistics, chemistry, veterinary) of the RSDO5 corpus and compare the results with state-of-art term extraction approaches for the Slovenian language. Our approach provides significant improvements in terms of F1 score over the previous state-of-the-art, which proves that contextual word embeddings are valuable for improving term extraction.1. Introduction Automated terminology extraction (ATE) refers to the task of extracting meaningful terms from domain-specific texts. Terms are single-word (SWU) or multi-word units (MWU) of knowledge, which are relevant for a particular domain. Since manual identification of terms is costly and time consuming, ATE approaches can reduce the effort needed to generate relevant domain-specific terms. Recognizing and extracting domain-specific terms, which is useful in various fields, such as translation, dictionary creation, ontology generation and others, remains a difficult task.

corpus, extraction, frequency, (17 more...)

arXiv.org Artificial Intelligence

2502.17278

Country:

Europe > Slovenia > Gorizia > Municipality of Vipava > Vipava (0.04)
Europe > Slovenia > Gorizia > Municipality of Nova Gorica > Nova Gorica (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Asia > Malaysia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Add feedback

Building Large Lexicalized Ontologies from Text: a Use Case in Automatic Indexing of Biotechnology Patents

Nédellec, Claire, Golik, Wiktoria, Aubin, Sophie, Bossy, Robert

arXiv.org Artificial IntelligenceOct-2-2020

This paper presents a tool, TyDI, and methods experimented in the building of a termino-ontology, i.e. a lexicalized ontology aimed at fine-grained indexation for semantic search applications. TyDI provides facilities for knowledge engineers and domain experts to efficiently collaborate to validate, organize and conceptualize corpus extracted terms. A use case on biotechnology patent search demonstrates TyDI's potential.

artificial intelligence, information management, ontology, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-642-16438-5_41

2010.0086

Country:

Europe > Spain (0.04)
Europe > Portugal (0.04)
Europe > Hungary (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Add feedback

Improving Term Extraction with Terminological Resources

Aubin, Sophie, Hamon, Thierry

arXiv.org Artificial IntelligenceDec-1-2009

Studies of different term extractors on a corpus of the biomedical domain revealed decreasing performances when applied to highly technical texts. The difficulty or impossibility of customising them to new domains is an additional limitation. In this paper, we propose to use external terminologies to influence generic linguistic data in order to augment the quality of the extraction. The tool we implemented exploits testified terms at different steps of the process: chunking, parsing and extraction of term candidates. Experiments reported here show that, using this method, more term candidates can be acquired with a higher level of reliability. We further describe the extraction process involving endogenous disambiguation implemented in the term extractor YaTeA.

artificial intelligence, natural language, term candidate, (17 more...)

arXiv.org Artificial Intelligence

cs/0609019

Country: Europe > France (0.28)

Genre:

Research Report (0.70)
Workflow (0.48)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.57)

Add feedback

Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

Wong, Wilson, Liu, Wei, Bennamoun, Mohammed

arXiv.org Artificial IntelligenceOct-1-2008

Most works related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, the number of independent research that study the notion of unithood and produce dedicated techniques for measuring unithood is extremely small. We propose a new approach, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our evaluations revealed a precision and recall of 98.68% and 91.82% respectively with an accuracy at 95.42% in measuring the unithood of 1005 test cases.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

0810.0156

Country:

Asia (0.93)
Europe (0.93)
Oceania > Australia (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.91)
Information Technology > Information Management > Search (0.88)

Add feedback

Determining the Unithood of Word Sequences using a Probabilistic Approach

Wong, Wilson, Liu, Wei, Bennamoun, Mohammed

arXiv.org Artificial IntelligenceOct-1-2008

Most research related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work were mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

0810.0139

Country:

Europe (0.68)
Asia (0.68)
Oceania > Australia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback